
    Onboard sampling of the rockfish and lingcod commercial passenger fishing vessel industry in northern and central California, January through December 1993

    The Central California Marine Sport Fish Project has been collecting angler catch data on board Commercial Passenger Fishing Vessels (CPFVs) fishing for rockfish or lingcod since 1987. The program depends on the voluntary cooperation of CPFV owners and operators. This third report in a series presents data collected in 1993, refers to historical data from 1987 to 1992, and documents trends in species composition, angler effort, catch per unit effort (CPUE), and, for selected species, mean length and length frequency. Angler catches on board central and northern California CPFVs were sampled at 15 ports, ranging from Crescent City in the north to Port San Luis (Avila Beach) in the south. Technicians observed a total of 2385 anglers fishing on 248 CPFV trips. These observed anglers caught 29,622 fish, of which 27,421 were determined to be kept. Over 60% of these fish were caught at Monterey or Morro Bay area ports. Only 18 of the 58 species each comprised at least one percent of the catch. The top ten species in order of abundance were blue, yellowtail, chilipepper, rosy, widow, canary, greenspotted, bocaccio, and vermilion rockfishes, and lingcod. Blue and yellowtail rockfishes and chilipepper together comprised over 50% of the observed catch. Overall, rockfishes accounted for 35, or 59%, of the 58 identified species. In general, 1993 data indicated that, with a few exceptions, CPFV fishery resources in all port areas were in a viable and sustainable condition, similar to the previous six years. This study identified nine species, lingcod and eight rockfishes, with areas of concern that were primarily port-specific. Six of these ranked among the 10 most frequently observed species; five were schooling or migratory species, two were nearshore species, and three were offshore species. The trends of most concern continue to be declining catch per angler hour (CPAH) of yellowtail rockfish in the Bodega Bay area, lingcod in shallow locations near the Monterey area, and yelloweye rockfish in the San Francisco area, as well as decreasing mean lengths of canary rockfish in the Monterey area and brown rockfish in the Morro Bay area. Populations of black rockfish, the species presently of greatest concern in the CPFV fishery, showed some positive signs this year. Also on the positive side, the Monterey and Morro Bay areas saw an increased availability of newly recruited, smaller juvenile vermilion rockfish in observed catches. Total catch estimates were within the range of values observed in previous years. (132pp.)

    Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions

    Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed—either explicitly or implicitly—to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, robustness, and/or speed. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition of an m × n matrix. (i) For a dense input matrix, randomized algorithms require O(mn log(k)) floating-point operations (flops) in contrast to O(mnk) for classical algorithms. (ii) For a sparse input matrix, the flop count matches classical Krylov subspace methods, but the randomized approach is more robust and can easily be reorganized to exploit multiprocessor architectures. (iii) For a matrix that is too large to fit in fast memory, the randomized techniques require only a constant number of passes over the data, as opposed to O(k) passes for classical algorithms. In fact, it is sometimes possible to perform matrix approximation with a single pass over the data.
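
    The two-stage framework described above is straightforward to sketch in NumPy. The following is a minimal illustration, not the paper's reference implementation; the oversampling parameter p and the number of power iterations are illustrative choices.

    ```python
    import numpy as np

    def randomized_svd(A, k, p=10, n_iter=2, seed=0):
        """Rank-k SVD approximation via the two-stage randomized framework."""
        rng = np.random.default_rng(seed)
        m, n = A.shape

        # Stage A: sample the range of A with a Gaussian test matrix to find
        # an orthonormal basis Q that captures most of the action of A.
        Omega = rng.standard_normal((n, k + p))   # p = oversampling
        Y = A @ Omega
        for _ in range(n_iter):                   # power iterations improve
            Y = A @ (A.T @ Y)                     # accuracy when the spectrum decays slowly
        Q, _ = np.linalg.qr(Y)

        # Stage B: compress A to the subspace and factor deterministically.
        B = Q.T @ A                               # small (k + p) x n matrix
        U_hat, s, Vt = np.linalg.svd(B, full_matrices=False)
        return (Q @ U_hat)[:, :k], s[:k], Vt[:k, :]
    ```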

    Onboard sampling of the rockfish and lingcod commercial passenger fishing vessel industry in northern and central California, 1992

    In 1992, fishery technicians sampled 230 commercial passenger fishing vessel (CPFV) trips targeting rockfish and lingcod from the port areas of Fort Bragg, Bodega Bay, San Francisco, Monterey, and Morro Bay. The skippers of 44 vessels and 2,190 anglers cooperated in the study. Species composition by port area and month, catch per unit effort, mean length, and length frequency of lingcod and the 18 most frequently observed rockfish species are presented, as well as fishing effort relative to time, depth, and distance from port. Total catch estimates based on unadjusted and adjusted logbook records are summarized. The average catch of kept fish per angler day was 12.6, and the average catch of kept fish per angler hour was 4.0. A continuing trend of increasingly frequent trips to deep (> 40 fm) locations was observed in the Bodega Bay, San Francisco, and Monterey areas. Bodega Bay and San Francisco showed the highest frequency of trips to distant locations. A total of 29,731 fish of 60 species were observed caught during the study. Rockfish comprised 93.5% by number of the total observed catch. The five most frequently observed species were blue, yellowtail, widow, and rosy rockfishes, and bocaccio, with lingcod ranking eighth. CPFV angler success, as determined by catch per angler hour, generally increased in all ports in 1992 compared with 1988-91 data (Reilly et al. 1993). However, port-specific areas of major concern were identified for chilipepper, lingcod, and black rockfish, and to a lesser extent for brown, canary, vermilion, yelloweye, widow, and greenspotted rockfishes. These areas of concern included a steadily declining catch rate, steadily declining mean length, and/or a high percentage of sexually immature fish in the sampled catch. Recent sampling of the commercial hook-and-line fishery in northern and central California indicates that most rockfishes taken by CPFV anglers are also harvested commercially. (105pp.)

    Dimension-adaptive bounds on compressive FLD Classification

    Efficient dimensionality reduction by random projections (RP) is gaining popularity; hence, the learning guarantees achievable in RP spaces are of great interest. In the finite-dimensional setting, it has been shown for the compressive Fisher Linear Discriminant (FLD) classifier that, for good generalisation, the required target dimension grows only as the log of the number of classes and is not adversely affected by the number of projected data points. However, these bounds depend on the dimensionality d of the original data space. In this paper we give further guarantees that remove d from the bounds under certain conditions of regularity on the data density structure. In particular, if the data density does not fill the ambient space, then the error of compressive FLD is independent of the ambient dimension and depends only on a notion of 'intrinsic dimension'.
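
    As a concrete illustration of the compressive FLD setting, here is a minimal two-class sketch in NumPy. The Gaussian projection scale and the pooled within-class covariance estimate are illustrative assumptions, not the paper's exact construction.

    ```python
    import numpy as np

    def compressive_fld_fit(X, y, target_dim, seed=0):
        """Fit Fisher's linear discriminant in a randomly projected space."""
        rng = np.random.default_rng(seed)
        d = X.shape[1]
        # Gaussian random projection from d dimensions down to target_dim.
        R = rng.standard_normal((d, target_dim)) / np.sqrt(target_dim)
        Z = X @ R
        mu0, mu1 = Z[y == 0].mean(axis=0), Z[y == 1].mean(axis=0)
        # Pooled within-class covariance, estimated in the compressed space.
        Sw = np.cov(Z[y == 0], rowvar=False) + np.cov(Z[y == 1], rowvar=False)
        w = np.linalg.solve(Sw, mu1 - mu0)      # discriminant direction
        b = -0.5 * w @ (mu0 + mu1)              # threshold at the class midpoint
        return R, w, b

    def compressive_fld_predict(X, R, w, b):
        return ((X @ R) @ w + b > 0).astype(int)
    ```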

    Solving k-means on High-dimensional Big Data

    In recent years, there have been major efforts to develop data stream algorithms that process inputs in one pass over the data with low memory requirements. For the k-means problem, this has led to the development of several (1+ε)-approximations (under the assumption that k is a constant), but also to the design of algorithms that are extremely fast in practice and compute solutions of high accuracy. However, when not only the length of the stream but also the dimensionality of the input points is high, current methods reach their limits. We propose two algorithms, piecy and piecy-mr, based on the recently developed data stream algorithm BICO, that can process high-dimensional data in one pass and output a solution of high quality. While piecy is suited for high-dimensional data with a medium number of points, piecy-mr is meant for high-dimensional data that comes in a very long stream. We provide an extensive experimental study to evaluate piecy and piecy-mr that shows the strength of the new algorithms. Comment: 23 pages, 9 figures, published at the 14th International Symposium on Experimental Algorithms (SEA 2015)
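
    To make the chunk-based streaming idea concrete, here is a hypothetical two-level sketch in Python: each chunk of the stream is compressed to a small weighted summary, and the summaries are clustered at the end. This illustrates the general strategy only; it is not the piecy/BICO implementation, and the parameter names are illustrative.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    def chunked_stream_kmeans(stream, k, summary_size=100):
        """Toy two-level streaming k-means: summarize chunks, then cluster summaries."""
        centers, weights = [], []
        for chunk in stream:                       # chunk: (n_chunk, d) array
            m = min(summary_size, len(chunk))
            km = KMeans(n_clusters=m, n_init=1).fit(chunk)
            centers.append(km.cluster_centers_)    # summary points for this chunk
            weights.append(np.bincount(km.labels_, minlength=m))  # points per summary
        C, w = np.vstack(centers), np.concatenate(weights)
        # Cluster the weighted summaries to obtain the final k centers.
        return KMeans(n_clusters=k, n_init=5).fit(C, sample_weight=w).cluster_centers_
    ```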

    Incremental dimension reduction of tensors with random index

    We present an incremental, scalable, and efficient dimension reduction technique for tensors that is based on sparse random linear coding. Data is stored in a compactified representation of fixed size, which makes memory requirements low and predictable. Component encoding and decoding are performed on-line without computationally expensive re-analysis of the data set. The range of tensor indices can be extended dynamically without modifying the component representation. This idea originates from a mathematical model of semantic memory and a method known as random indexing in natural language processing. We generalize the random-indexing algorithm to tensors and present signal-to-noise-ratio simulations for representations of vectors and matrices. We also present a mathematical analysis of the approximate orthogonality of high-dimensional ternary vectors, a property that underpins this and other similar random-coding approaches to dimension reduction. To further demonstrate the properties of random indexing, we present results of a synonym identification task. The method presented here has some similarities with random projection and Tucker decomposition, but it performs well only at high dimensionality (n > 10^3). Random indexing is useful for a range of complex practical problems, e.g., in natural language processing, data mining, pattern recognition, event detection, graph searching, and search engines. Prototype software is provided. It supports encoding and decoding of tensors of order >= 1 in a unified framework, i.e., vectors, matrices, and higher-order tensors. Comment: 36 pages, 9 figures
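
    The core mechanics are easy to sketch for the order-2 (matrix) case: each row and column index gets a sparse ternary index vector, entries are encoded into one fixed-size state by outer products, and decoding is a normalized projection. A minimal NumPy illustration, with the state size and sparsity chosen arbitrarily (not the authors' prototype software):

    ```python
    import numpy as np

    def ternary_index_vector(dim, nnz, rng):
        """Sparse ternary index vector: nnz/2 entries +1, nnz/2 entries -1."""
        v = np.zeros(dim)
        idx = rng.choice(dim, size=nnz, replace=False)
        v[idx[: nnz // 2]] = 1.0
        v[idx[nnz // 2:]] = -1.0
        return v

    rng = np.random.default_rng(0)
    d1, d2, nnz = 1000, 1000, 10      # compressed state size, index sparsity
    # Index vectors; in practice these can be regenerated on demand, which is
    # what allows the index range to grow without changing the representation.
    rows = [ternary_index_vector(d1, nnz, rng) for _ in range(2000)]
    cols = [ternary_index_vector(d2, nnz, rng) for _ in range(2000)]

    S = np.zeros((d1, d2))                     # fixed-size state for a 2000 x 2000 matrix
    S += 3.7 * np.outer(rows[4], cols[7])      # encode A[4, 7] = 3.7
    S += -1.2 * np.outer(rows[9], cols[100])   # encode A[9, 100] = -1.2

    # Decode A[4, 7]: project onto the same pair of index vectors and normalize.
    est = rows[4] @ S @ cols[7] / (nnz * nnz)
    print(est)   # approximately 3.7, plus crosstalk noise from other entries
    ```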

    Toward criteria for pragmatic measurement in implementation research and practice: a stakeholder-driven approach using concept mapping

    Background: Advancing implementation research and practice requires valid and reliable measures of implementation determinants, mechanisms, processes, strategies, and outcomes. However, researchers and implementation stakeholders are unlikely to use measures if they are not also pragmatic. The purpose of this study was to establish a stakeholder-driven conceptualization of the domains that comprise the pragmatic measure construct. It built upon a systematic review of the literature and semi-structured stakeholder interviews that generated 47 criteria for pragmatic measures, and aimed to further refine that set of criteria by identifying conceptually distinct categories of the pragmatic measure construct and providing quantitative ratings of the criteria’s clarity and importance. Methods: Twenty-four stakeholders with expertise in implementation practice completed a concept mapping activity wherein they organized the initial list of 47 criteria into conceptually distinct categories and rated their clarity and importance. Multidimensional scaling, hierarchical cluster analysis, and descriptive statistics were used to analyze the data. Findings: The 47 criteria were meaningfully grouped into four distinct categories: (1) acceptable, (2) compatible, (3) easy, and (4) useful. Average ratings of clarity and importance at the category and individual-criterion levels are presented. Conclusions: This study advances the field of implementation science and practice by providing clear and conceptually distinct domains of the pragmatic measure construct. Next steps will include a Delphi process to develop consensus on the most important criteria and the development of quantifiable pragmatic rating criteria that can be used to assess measures.

    Non-Redundant Spectral Dimensionality Reduction

    Spectral dimensionality reduction algorithms are widely used in numerous domains, including recognition, segmentation, tracking, and visualization. However, despite their popularity, these algorithms suffer from a major limitation known as the "repeated eigen-directions" phenomenon: many of the embedding coordinates they produce typically capture the same direction along the data manifold. This leads to redundant and inefficient representations that do not reveal the true intrinsic dimensionality of the data. In this paper, we propose a general method for avoiding redundancy in spectral algorithms. Our approach relies on replacing the orthogonality constraints underlying those methods with unpredictability constraints. Specifically, we require that each embedding coordinate be unpredictable (in the statistical sense) from all previous ones. We prove that these constraints necessarily prevent redundancy and provide a simple technique to incorporate them into existing methods. As we illustrate on challenging high-dimensional scenarios, our approach produces significantly more informative and compact representations, which improve visualization and classification tasks.
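
    For context, here is a minimal sketch of one algorithm in this family (Laplacian eigenmaps), whose embedding coordinates are orthogonal eigenvectors of a graph Laplacian; on elongated manifolds, successive coordinates from such a scheme can redundantly encode the same manifold direction. This shows the orthogonality-constrained baseline the paper improves upon, not the proposed unpredictability-constrained method, and the kernel bandwidth is an illustrative choice.

    ```python
    import numpy as np
    from scipy.spatial.distance import cdist
    from scipy.sparse.csgraph import laplacian

    def laplacian_eigenmaps(X, n_components=2, sigma=1.0):
        """Classic spectral embedding with orthogonal eigenvector coordinates."""
        # Gaussian affinity graph over the data points.
        W = np.exp(-cdist(X, X, "sqeuclidean") / (2.0 * sigma**2))
        np.fill_diagonal(W, 0.0)
        L = laplacian(W, normed=True)
        vals, vecs = np.linalg.eigh(L)
        # Skip the trivial constant eigenvector; take the next n_components.
        return vecs[:, 1 : n_components + 1]
    ```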

    Fast Label Embeddings via Randomized Linear Algebra

    Many modern multiclass and multilabel problems are characterized by increasingly large output spaces. For these problems, label embeddings have been shown to be a useful primitive that can improve computational and statistical efficiency. In this work we utilize a correspondence between rank-constrained estimation and low-dimensional label embeddings that uncovers a fast label embedding algorithm which works in both the multiclass and multilabel settings. The result is a randomized algorithm whose running time is exponentially faster than naive algorithms. We demonstrate our techniques on two large-scale public datasets, from the Large Scale Hierarchical Text Challenge and the Open Directory Project, where we obtain state-of-the-art results. Comment: To appear in the proceedings of the ECML/PKDD 2015 conference. Reference implementation available at https://github.com/pmineiro/randembe
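
    As a rough illustration of how randomized linear algebra can yield label embeddings, the sketch below applies a randomized range finder to the label-feature cross-covariance Y^T X, so that labels co-occurring with similar features receive nearby embeddings. This is a simplified stand-in under stated assumptions, not the randembed reference implementation.

    ```python
    import numpy as np

    def fast_label_embedding(X, Y, k, p=10, seed=0):
        """Embed labels via a randomized range finder on Y^T X.

        X: (n_examples, n_features) feature matrix.
        Y: (n_examples, n_labels) binary label-indicator matrix.
        Returns a (n_labels, k) label embedding.
        """
        rng = np.random.default_rng(seed)
        Omega = rng.standard_normal((X.shape[1], k + p))  # random test matrix
        G = Y.T @ (X @ Omega)      # sample the range of Y^T X without forming it
        Q, _ = np.linalg.qr(G)     # orthonormal basis for the dominant subspace
        return Q[:, :k]
    ```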